Abstract: Data analysis is an important function in Big Data processing and computing, allowing huge amounts of data to be processed over very large clusters. MapReduce is widely recognized as a popular way to handle data in data-intensive environments because of its excellent scalability and good fault tolerance. Cost analyses show that server cost still dominates the total cost of large-scale data centers and cloud systems, and heterogeneous workloads pose a further problem in such data centers. To analyze huge datasets, this paper proposes an analysis framework for task and resource provisioning that reduces peak resource usage. Indexes are built in the data center to support the analysis. The MapReduce model processes the datasets efficiently in a hybrid structure, which reduces server cost in the data center. The approach uses the Hadoop Distributed File System (HDFS) and a parallel database for processing and indexing. The analysis framework is constructed using Particle Swarm Optimization (PSO); it incorporates the parallel database and handles the workloads.

Keywords: PSO, MapReduce, Hadoop Framework, Clusters.